Abstract. We review recent progress in the study of the vertex-cover problem 
(VC). VC belongs to the class of NP-complete graph theoretical problems, which 
plays a central role in theoretical computer science. On ensembles of random 
graphs, VC exhibits a coverable-uncoverable phase transition. Very close to 
this transition, depending on the solution algorithm, easy-hard transitions in the 
typical running time of the algorithms occur. 

We explain a statistical mechanics approach, which works by mapping VC to 
a hard-core lattice gas, and then applying techniques like the replica trick or the 
cavity approach. Using these methods, the phase diagram of VC could be obtained 
exactly for connectivities c < e, where VC is replica symmetric. Recently, this 
result could be confirmed using traditional mathematical techniques. For c > e, 
the solution of VC exhibits full replica symmetry breaking. 

The statistical mechanics approach can also be used to study analytically 
the typical running time of simple complete and incomplete algorithms for VC. 
Finally, we describe recent results for VC when studied on other ensembles of 
finite- and infinite-dimensional graphs. 
1. Introduction 

Since the 1980s, the relations between the fields of statistical physics and 
(theoretical) computer science have been growing. This is true in particular for 
the study of disordered glassy systems in physics and the research on optimization 
problems in computer science [1]. Both fields can profit strongly from each other. 
In one way computer science helps physics: Recently developed efficient optimization 
algorithms [2] help to study the low-temperature behavior of physical models, like 
spin glasses, random field systems or solid-on-solid models. On the other hand 
also developments in statistical physics have helped to develop or improve existing 
optimization algorithms. The most prominent example is the invention of the 
simulated annealing method which has been applied to a variety of optimization 
problems. 

In recent years another variant of how physics can help computer science has 
emerged. Computational problems can be sorted into different classes. From the 
viewpoint of a person wanting to solve problems, a very convenient class is the class 
P: It collects all problems which can be solved on a computer in a running time, which 
grows even in the worst case only polynomially with the size of the problem. These 
problems are called easy. In theoretical computer science these problems are 
analyzed using model computers, e.g. the Turing machine (TM). A deterministic 
TM can solve the same problems as a conventional (von Neumann) computer. But 
not all problems can be solved polynomially. There are problems for which 
provably no polynomial algorithm exists. These problems are called hard. But most 
of these problems have only academic applications. The most interesting problems lie 
on the interface between polynomial and exponential running time. They belong 
to the class of nondeterministic polynomial problems (NP). This means that 
a nondeterministic TM can solve any problem from NP in polynomial time. This 
works in the following way: First, the nondeterministic abilities of the TM are used 
to generate a solution. Then the TM proves deterministically that the solution is 
correct. For purely deterministic computers, all algorithms for solving problems from 
NP known so far need in the worst case an exponentially growing running time. Hence, 
it appears that the problems from NP are hard as well. But so far there is no proof 
that the problems from NP are indeed hard. This is the so-called P-NP problem, one 
of the great open questions in computer science ‡. Expressed in colloquial language, 
we have to answer the question: "What is it that makes a problem hard?" 

A notable advance towards the answer of this question has recently been 
achieved by realizing that worst case and typical case are different. This means that 
for some problems there are ensembles of problems which can be solved typically in 
polynomial time, while the worst case is still exponential. In particular, there are 
suitably parametrized ensembles of random problem instances, where in one region of 
parameter space the instances are easy while in another region the instances are hard. 
The typically hardest to solve instances are often found at the boundaries 
separating these regions. The effects found at the boundaries have much in common 
with phase transitions in physical systems [13, 14]. Recently methods from statistical 
physics [15], like the replica trick or the cavity approach, have been applied to classical 
problems from computer science. The most prominent one is the satisfiability problem 
(SAT). SAT is the most famous and central of all problems in theoretical computer 
science: In 1971, it was the first one which was shown by Cook to be NP-complete, 
which means that all problems from NP can be mapped onto SAT using 
polynomial algorithms. Hence, SAT is at least as hard as any problem in NP. Using 
the statistical mechanics approach it is possible to obtain results which have not been 
found before using classical mathematical methods [17, 18, 19, 20]. Furthermore this 
approach allows one to invent new algorithms which are sometimes substantially faster 
than previously known algorithms [21]. 

‡ The Clay Mathematics Institute of Cambridge, Massachusetts (CMI) has designated a $1 million 
prize for the solution of the P-NP problem. 

In this paper, we review the recent progress in the field by concentrating on 
the vertex-cover problem (VC), which belongs to the six "classical" NP-complete 
problems in theoretical computer science. VC is a problem defined on graphs. 
We first introduce VC and show that it is NP-complete. Then we present some 
algorithms which can be used to solve VC. In the succeeding section, we present results 
characterizing the phase transition, which occurs when studying VC on ensembles of 
random graphs. Next, we describe the results obtained for the phase diagram using 
statistical mechanics methods. In section six we show how the typical running time of 
algorithms can be analyzed analytically. Next, we consider other ensembles of random 
graphs, especially scale-free graphs and graphs consisting of a collection of connected 
cliques. Finally, we summarize and give an outlook. 

2. The vertex-cover problem 

In this section, we will introduce the terminology, show that VC is NP-complete and 
review some rigorous results about vertex cover which have been obtained previously 
by applying mathematical techniques. 

2.1. Vertex cover and related problems 

Let us start with the definition of vertex covers. We consider a graph $G = (V, E)$ with 
$N$ vertices $i \in \{1, 2, \ldots, N\}$ and undirected edges $\{i, j\} \in E \subset V \times V$ connecting pairs 
of vertices. Please note that $\{i, j\}$ and $\{j, i\}$ both denote the same edge. 

Definition 1: A vertex cover $V_{vc}$ is a subset $V_{vc} \subset V$ of vertices such that for all 
edges $\{i, j\} \in E$ at least one of the endpoints is in $V_{vc}$, i.e. $i \in V_{vc}$ or $j \in V_{vc}$. 

Later on also subsets $V'$ are considered which are not covers. Anyway, we call all 
vertices in $V'$ covered, all others uncovered. Also edges from $E \cap [(V' \times V) \cup (V \times V')]$ 
are called covered. This means that $V'$ is a vertex cover iff all edges are covered. 

There are three different variants of VC: 

P1 The minimal vertex-cover problem, which consists in finding a vertex cover $V_{vc}$ of 
minimal cardinality, and calculating the minimal fraction $x_c(G) = |V_{vc}|/N$ needed 
to cover the whole graph. 

P2 The decision variant of this problem is: "Given a number $X = xN$, is there a 
vertex cover $V_{vc}$ of size $X$?" 

P3 If there is no vertex cover of size $X$, one can study the related optimization 
problem: Find a set $V'$ with $|V'| = X$ which minimizes the number of uncovered 
edges. In other words, we try to distribute $X$ covering marks on the $N$ vertices 
in an optimal way, such that the following energy of configurations is minimized: 

$E(G, x) = \min\{\text{number of uncovered edges when covering } xN \text{ vertices}\}$ (1) 

This means, the graph is coverable using $X = xN$ vertices iff the ground state 
energy is zero. 
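
The three variants can be made concrete with a brute-force sketch. This is only an illustration of the definitions (it enumerates all subsets, so it is exponential in $N$); the function names and the small example graph are assumptions introduced here, not part of the original text.

```python
from itertools import combinations

def uncovered_edges(edges, marked):
    """Count the edges that have neither endpoint in the marked set."""
    return sum(1 for i, j in edges if i not in marked and j not in marked)

def is_vertex_cover(edges, marked):
    """A set is a vertex cover iff no edge is left uncovered."""
    return uncovered_edges(edges, marked) == 0

def ground_state_energy(n, edges, X):
    """E(G, x) of Eq. (1) for X = xN covering marks (problem P3), by enumeration."""
    return min(uncovered_edges(edges, set(s)) for s in combinations(range(n), X))

def min_cover_size(n, edges):
    """Smallest X for which the ground-state energy vanishes (problem P1)."""
    return next(X for X in range(n + 1) if ground_state_energy(n, edges, X) == 0)

# Hypothetical example graph: the path 0-1-2-3 plus the edge 1-3.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
print(min_cover_size(4, edges), ground_state_energy(4, edges, 1))
```

The decision variant P2 corresponds to asking whether `ground_state_energy(n, edges, X) == 0`.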
VC is equivalent to other problems: 

• An independent set is a subset of vertices which are pairwise disconnected in the 
graph $G$. Due to the above-mentioned properties, any set $V \setminus V_{vc}$ thus forms an 
independent set, and maximal independent sets are complementary to minimal 
vertex covers. 

• A clique is a fully connected subset of vertices, and thus an independent set in the 
complementary graph $\bar{G}$ where vertices $i$ and $j$ are connected whenever $\{i, j\} \notin E$ 
and vice versa. 

2.2. NP-completeness 

Here, we show the NP-completeness of VC. For this purpose, we first introduce the 
3-satisfiability problem (3-SAT), which is known to be NP-complete. Then we show 
how 3-SAT can be mapped onto VC in polynomial time. 

3-SAT is a problem concerning Boolean formulas. A Boolean formula $F$ in $K = 3$ 
conjunctive normal form (CNF) has the following structure: It is a formula over $N$ 
Boolean variables $\{x_1, x_2, \ldots, x_N\}$ which contains $M$ clauses $C_p$: $F = C_1 \wedge C_2 \wedge \ldots \wedge C_M$. 
Each clause is a disjunction of three literals, $C_p = l_p^1 \vee l_p^2 \vee l_p^3$, where each literal 
is either a variable ($l_p^i = x_j$) or a negated variable ($l_p^i = \bar{x}_j$). The 3-SAT problem is: 

"Given a 3-CNF formula $F$, is there an assignment of the variables 
$\{x_1, \ldots, x_N\} \in \{0, 1\}^N$ such that $F$ evaluates to true, i.e., is $F$ satisfiable?" 

3-SAT is a special variant of SAT and has been proven to be NP-complete before. 
The proof of the NP-completeness of VC works by reducing 3-SAT to VC in 
polynomial time. 

First, we show VC $\in$ NP: It is very easy to decide for a given subset $V'$ of vertices, 
whether all edges are covered, i.e. whether $V'$ is a vertex cover, by just iterating over 
all edges. 

Hence, it remains to show that 3-SAT is polynomially reducible to VC (one writes 
3-SAT $\le_p$ VC). 

Let $F = C_1 \wedge \ldots \wedge C_m$ be a 3-SAT formula with variables $X = \{x_1, \ldots, x_n\}$ and 
$C_p = l_p^1 \vee l_p^2 \vee l_p^3$ for all $p$. 

We have to create a graph $G$ and a threshold $K$, such that $G$ has a VC of size 
lower than or equal to $K$ iff $F$ is satisfiable. For this purpose, we set: 

• $V_1 = \{v_1, \bar{v}_1, \ldots, v_n, \bar{v}_n\}$ ($|V_1| = 2n$) and $E_1 = \{\{v_1, \bar{v}_1\}, \{v_2, \bar{v}_2\}, \ldots, \{v_n, \bar{v}_n\}\}$, 
i.e. for each variable occurring in $F$ we create a pair of vertices and an edge 
connecting it. 

To cover the edges in $E_1$, we have to include at least one vertex per pair in the 
covering set. In this part of the graph, each cover corresponds to an assignment 
of the variables with the following idea behind it: If $X_i = 1$ for variable $x_i$, then $v_i$ 
should be covered, while if $X_i = 0$ then $\bar{v}_i$ is to be covered. It will become clear 
soon why this correspondence has been chosen. 

• For each clause in $F$ we introduce three vertices connected in form of a triangle: 
$V_2 = \{a_1^1, a_1^2, a_1^3, a_2^1, a_2^2, a_2^3, \ldots, a_m^1, a_m^2, a_m^3\}$ and $E_2 = \{\{a_1^1, a_1^2\}, \{a_1^2, a_1^3\}, \{a_1^3, a_1^1\}, 
\{a_2^1, a_2^2\}, \{a_2^2, a_2^3\}, \{a_2^3, a_2^1\}, \ldots, \{a_m^1, a_m^2\}, \{a_m^2, a_m^3\}, \{a_m^3, a_m^1\}\}$. 

Per triangle, i.e. per clause, we have to include at least two vertices in a VC. 
We intend that in a cover of minimum size, the uncovered vertex corresponds to 
a literal which is satisfied. This will be induced by the edges generated in the 
following. 

• Finally, for each position $i$ in a clause $p$, vertex $a_p^i$ is connected with the 
vertex representing the literal appearing at that position of the clause: $E_3 = 
\{\{a_p^i, v_j\} \mid p = 1, \ldots, m,\ i = 1, 2, 3 \text{ if } l_p^i = x_j\} \cup \{\{a_p^i, \bar{v}_j\} \mid p = 1, \ldots, m,\ i = 
1, 2, 3 \text{ if } l_p^i = \bar{x}_j\}$. Hence, $E_3$ contains edges each connecting one vertex from 
$V_1$ with one vertex from $V_2$. 

• The graph $G$ is the combination of the above introduced vertices and edges: 
$G = (V, E)$, $V = V_1 \cup V_2$, $E = E_1 \cup E_2 \cup E_3$. 

• The size of the vertex cover to be constructed is set to $K = n + 2m$. 
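
The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: literals are encoded as nonzero integers (`+j` for $x_j$, `-j` for its negation), the vertex labels `('v', j, sign)` and `('a', p, i)` are naming choices made here, and both checks are exponential brute force meant only for tiny formulas.

```python
from itertools import combinations, product

def sat_to_vc(n, clauses):
    """Map a 3-CNF formula with n variables to (vertices, edges, K)."""
    vertices, edges = [], []
    for j in range(1, n + 1):                    # V_1 and E_1: one edge per variable
        vertices += [('v', j, True), ('v', j, False)]
        edges.append((('v', j, True), ('v', j, False)))
    for p, clause in enumerate(clauses):         # V_2 and E_2: one triangle per clause
        tri = [('a', p, i) for i in range(3)]
        vertices += tri
        edges += [(tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])]
        for i, lit in enumerate(clause):         # E_3: corner i to its literal vertex
            edges.append((tri[i], ('v', abs(lit), lit > 0)))
    return vertices, edges, n + 2 * len(clauses)

def has_cover_of_size(vertices, edges, K):
    """Brute force: a cover with at most K vertices exists iff one with exactly K does."""
    for subset in combinations(vertices, K):
        chosen = set(subset)
        if all(u in chosen or w in chosen for u, w in edges):
            return True
    return False

def satisfiable(n, clauses):
    """Brute force over all 2^n assignments."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

For small formulas one can compare `has_cover_of_size(*sat_to_vc(n, clauses))` with `satisfiable(n, clauses)`; the two answers should agree, which is the statement proven below.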

In the following example, we show how the transformation works for a small 
3-SAT formula: 

Example: We consider $F = (x_1 \vee x_3 \vee x_4) \wedge (x_1 \vee x_2 \vee x_4)$. The resulting graph 
$G(V, E)$ is displayed in Fig. 1. 

Figure 1. VC instance resulting from the 3-SAT instance $F = (x_1 \vee x_3 \vee x_4) \wedge 
(x_1 \vee x_2 \vee x_4)$. 

The number of vertices generated by this transformation is $O(n + m)$, i.e. linear 
in the sum of the number of clauses and the number of variables of $F$. Since the 
number of variables is bounded by three times the number of clauses, the construction 
of the graph is linear in the length of $F$, i.e. in particular polynomial. It remains to 
show: $F$ is satisfiable if and only if there exists a vertex cover $V'$ of $G$ with size $|V'| \le K$. 

Now let $F$ be satisfiable and $\{X_i\}$, $X_i \in \{0, 1\}$, a satisfying assignment. We set 
$V_1' = \{v_i \mid X_i = 1\} \cup \{\bar{v}_i \mid X_i = 0\}$. Obviously $|V_1'| = n$ and all edges in $E_1$ are covered. 

For each clause $C_p$, since it is satisfied by $\{X_i\}$, there is one satisfied literal $l_p^{i(p)}$. We 
set $V_2' = \{a_p^i \mid p = 1, \ldots, m;\ i \ne i(p)\}$. We have included 2 vertices per clause in $V_2'$ (by 
excluding $a_p^{i(p)}$), i.e. 2 vertices per triangle in $E_2$. Thus, $|V_2'| = 2m$ and all edges of 
$E_2$ are covered. Furthermore, since $l_p^{i(p)}$ is satisfied, the vertex corresponding to the 
literal is in $V_1'$, hence all edges contained in $E_3$ are covered as well. To summarize, 
$V' = V_1' \cup V_2'$ is a VC of $G$ and $|V'| = n + 2m \le K$. 

Conversely, let $V' \subset V$ be a VC of $G$ with $|V'| \le K$. Since a VC must include 
at least one vertex per edge from $E_1$ and at least two vertices per triangle from $E_2$, 
we know $|V'| \ge n + 2m = K$, hence we have $|V'| = K$, i.e. exactly one vertex per 
pair $v_i, \bar{v}_i$ and exactly two vertices per triplet $a_p^1, a_p^2, a_p^3$ are included in $V'$. Now we set 
$X_i = 1$ if $v_i \in V'$ and $X_i = 0$ if $v_i \notin V'$. Since each triangle (each corresponding to a 
clause) has one vertex $a_p^{i(p)} \notin V'$, we know that the vertex from $V_1$ connected with 
it is covered. Hence, the literal corresponding to this vertex is satisfied. Therefore, 
for each clause, we have a satisfied literal, hence $F$ is satisfied and $\{X_i\}$ is a satisfying 
assignment. 
2.3. Vertex covers of random graphs 

In order to speak of median or average cases, and of phase transitions, we have to 
introduce a probability distribution over graphs. This can be done best by using 
the concept of random graphs as already introduced about 40 years ago by Erdős and 
Rényi [22]. A random graph $G_{N,p}$ is a graph with $N$ vertices $V = \{1, \ldots, N\}$, where any 
pair of vertices is connected randomly and independently by an edge with probability 
$p$. So the expected number of edges becomes $p\binom{N}{2} = pN^2/2 + O(N)$, and the average 
connectivity of a vertex equals $p(N - 1)$. 

We are interested in the large-$N$ limit of finite-connectivity graphs, where $p = c/N$ 
with constant $c$. Then the average connectivity $c + O(N^{-1})$ stays finite. In this case, 
we also expect the size of minimal vertex covers to depend only on $c$, $x_c(G) = x_c(c)$ 
for almost all random graphs $G_{N,c/N}$. 
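
As a small illustration of this ensemble, the following sketch samples a graph from $G_{N,c/N}$ by testing every pair of vertices independently. The function names are illustrative assumptions, and the quadratic loop is chosen for clarity rather than speed.

```python
import random

def random_graph(N, c, seed=None):
    """Sample G(N, p) with p = c/N; returns a list of edges (i, j) with i < j."""
    rng = random.Random(seed)
    p = c / N
    return [(i, j) for i in range(N) for j in range(i + 1, N) if rng.random() < p]

def average_degree(N, edges):
    """Each edge contributes to the degree of two vertices."""
    return 2 * len(edges) / N

edges = random_graph(1000, 2.0, seed=1)
print(average_degree(1000, edges))  # fluctuates around c = 2 for large N
```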

Next we are going to present some previously derived rigorous bounds on $x_c(c)$. 
A general one for arbitrary, i.e. non-random graphs $G$ was given by Harant [24], who 
generalized an old result of Caro and Wei [23]. Translated into our notation, he showed 
that 

-c(G)<l-4 i'^^^v^) 

l^ieV di + l l^{i,j)eE (di + l)(dj + l) 

where $d_i$ is the connectivity (or degree) of vertex $i$. This can easily be converted into 
an upper bound on $x_c(c)$ which holds almost surely for $N \to \infty$. 

The vertex cover problem and the above-mentioned related problems were also 
studied in the case of random graphs, and even completely solved in the case of infinite 
connectivity graphs, where any edge is drawn with finite probability $p$, such that the 
expected number of edges is $p\binom{N}{2} = O(N^2)$. There the minimal VC has cardinality 
$(N - 2\ln_{1/(1-p)} N - O(\ln\ln N))$. Bounds in the finite-connectivity region of random 
graphs with $N$ vertices and $cN$ edges were given by Gazmuri [27]. He has shown that 

$x_l(c) < x_c(c) < 1 - \frac{\ln c}{c}$ (3) 

where the lower bound is given by the unique solution of 

$0 = x_l(c)\ln x_l(c) + (1 - x_l(c))\ln(1 - x_l(c)) + \frac{c}{2}(1 - x_l(c))^2$ . (4) 

This bound coincides with the so-called annealed bound in statistical physics. The 
correct asymptotics for large c was given by Frieze j28j : 

$x_c(c) = 1 - \frac{2}{c}\left(\ln c - \ln\ln c + 1 - \ln 2\right) + o\left(\frac{1}{c}\right)$ (5) 

with corrections of $o(1/c)$ decaying faster than $1/c$. 
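
The annealed lower bound can be evaluated numerically. The sketch below assumes the annealed condition in the form $0 = x\ln x + (1-x)\ln(1-x) + \frac{c}{2}(1-x)^2$, which follows from requiring that the expected number of covers of size $xN$ is of order one; the function names and the bisection tolerance are choices made here for illustration.

```python
import math

def annealed_condition(x, c):
    """Right-hand side of the annealed condition (assumed form, see lead-in)."""
    return x * math.log(x) + (1 - x) * math.log(1 - x) + 0.5 * c * (1 - x) ** 2

def annealed_lower_bound(c, tol=1e-12):
    """Bisection for the nontrivial root x_l(c) in (0, 1).

    The condition is positive for x -> 0 and negative just below x = 1,
    and it is convex in x, so the sign change is unique.
    """
    lo, hi = 1e-12, 1 - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if annealed_condition(mid, c) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for c in (1.0, 2.0, 4.0):
    print(c, annealed_lower_bound(c))
```

For $c = 2$ this gives a value close to 0.25, which indeed lies below the threshold found in the simulations discussed later.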

Few studies have investigated VC on other ensembles of graphs. They are 
reviewed in Sec. 7. 



3. Algorithms 

There are two types of algorithms: incomplete and complete ones. Complete 
algorithms guarantee to find the optimum or true solution, hence the solution space 
is searched in principle completely. For incomplete algorithms, it is not ensured that 
the true solution or the global optimum is found. But they are very often sufficient 
for practical applications. 
3.1. Incomplete Algorithms 

First, we present a greedy heuristic for finding small vertex covers, i.e. approximations 
for the solutions of problem P1. The basic idea of the heuristic is to cover as many 
edges as possible by using as few vertices as necessary. Thus, it is favorable to cover 
vertices with a high degree. This step can be iterated, while the degree of the vertices 
is adjusted dynamically by removing edges and vertices which are covered. This leads 
to the following algorithm, which returns an approximation of the minimum vertex 
cover $V_{vc}$; the size $|V_{vc}|$ is an upper bound of the true minimum vertex-cover size: 

algorithm min-cover(G) 
begin 
    initialize V_vc = ∅; 
    while there are uncovered edges do 
    begin 
        take one vertex i with the largest current degree d_i; 
        mark i as covered: V_vc = V_vc ∪ {i}; 
        remove all incident edges {i,j} from E; 
        remove vertex i from V; 
    end; 
    return(V_vc); 
end 

It is easy to invent examples where the heuristic fails to find the true minimum 
VC, e.g. a star graph having one center vertex to which k > 2 arms of length 2 are 
attached. 
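
A direct Python transcription of the greedy heuristic, together with the star-graph counterexample, might look as follows. It is a sketch for illustration only: the edge-set representation, the helper `star_with_arms` and the vertex labels are assumptions, and ties between vertices of equal degree are broken arbitrarily.

```python
def greedy_cover(edges):
    """Greedy heuristic: repeatedly cover a vertex of largest current degree."""
    remaining = {frozenset(e) for e in edges}
    cover = set()
    while remaining:
        degree = {}
        for e in remaining:                 # recompute the current degrees
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        i = max(degree, key=degree.get)     # a vertex of maximal current degree
        cover.add(i)
        remaining = {e for e in remaining if i not in e}
    return cover

def star_with_arms(k):
    """Center 'c' with k arms of length 2: c - ('m', a) - ('e', a)."""
    edges = []
    for a in range(k):
        edges += [('c', ('m', a)), (('m', a), ('e', a))]
    return edges

k = 4
print(len(greedy_cover(star_with_arms(k))), "vertices used by greedy;", k, "suffice")
```

The greedy run covers the center first and then needs one more vertex per arm, i.e. $k + 1$ vertices, whereas covering the $k$ middle vertices of the arms is already a vertex cover.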

This simplest heuristic has been generalized by one of the authors within the 
framework of a random vertex selection [29], which is characterized by a parameter $k$ 
called depth. Each vertex $i$ is selected with a probability $w_{d(i)}$ which depends on the 
(current) degree $d(i)$ of the vertex. Then, within the generalized heuristic, a subgraph 
$G^{(k)}(i) = (V^{(k)}(i), E^{(k)}(i))$ is taken, where $V^{(k)}(i)$ contains all vertices which have at 
most chemical distance $k$ from $i$. Here the chemical distance of two vertices $j$ and $i$ 
counts the number of edges of the shortest path from $i$ to $j$. $E^{(k)}(i)$ contains the edges 
connecting the vertices from $V^{(k)}(i)$. Then $G^{(k)}(i)$ is covered starting by covering all 
vertices with distance $k$ from $i$ and then iteratively selecting vertices $j$ among the 
remaining with maximal distance from $i$, uncovering $j$ and covering all neighbors of $j$. 
The results of an analysis of the dynamics of this algorithm are reviewed in Sec. 6.3. 

The special case $k = 1$ and $w_d = 1$ has been analyzed by Gazmuri [27] for deriving 
the bound (3). The greedy heuristic presented before corresponds to the case $k = 0$ 
and $w_d = \delta_{d,d_{max}}$, where $d_{max}$ is the current maximum degree in the graph. This case, 
where $w_d$ is dynamically adjusted, has not been analyzed so far. 

An alternative are incomplete algorithms based on conventional Monte Carlo 
(MC) simulations in the grand-canonical ensemble, characterized by a chemical 
potential $\mu$. Here we present a variant where one restricts the dynamics to 
true covers and allows movements of the covering marks as well as fluctuations of the 
size of the cover. First one selects an initial configuration, for example by using the 
above heuristics or by covering all vertices. For each MC step, a vertex $i$ is selected 
randomly. With probability $p$ (e.g. $p = 0.5$) a MOVE (M) step is performed, and 
with probability $1 - p$ an EXCHANGE (EX) step: 
M If vertex $i$ is covered and has exactly one uncovered neighbor, the covering mark 
is moved to the neighbor. In all other cases, the configuration remains unchanged. 

EX If the site is uncovered, a covering mark is inserted with probability $\exp(-\mu)$. 
If the site is covered, and all neighboring sites are covered, the covering mark is 
removed from $i$. 

Note that in this way detailed balance is fulfilled. Ground states, i.e. minimum-size 
vertex covers, can be obtained by starting with a small chemical potential, which is 
slowly increased. The chemical potential thus plays the same role in the algorithm as 
the decreasing temperature in simulated annealing [3]. Like the latter algorithm, MC 
simulations can reach a globally optimal vertex cover only on exponential time scales. 
On the other hand, the Monte Carlo approach allows one to study dynamic properties of 
the model, which can be regarded as a hard-core lattice gas; see also below. 
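
The MOVE and EXCHANGE steps can be sketched as follows. This is a minimal illustration, not the implementation used for the results reviewed here: the adjacency-list representation, the function names, the start from the fully covered graph and the schedule of $\mu$ values are all assumptions.

```python
import math
import random

def mc_step(adj, covered, mu, p_move, rng):
    """One MC step on a list of booleans `covered`; the cover stays valid."""
    i = rng.randrange(len(adj))
    if rng.random() < p_move:                       # MOVE step
        if covered[i]:
            free = [j for j in adj[i] if not covered[j]]
            if len(free) == 1:                      # exactly one uncovered neighbor
                covered[i], covered[free[0]] = False, True
    elif not covered[i]:                            # EXCHANGE: try to insert a mark
        if rng.random() < math.exp(-mu):
            covered[i] = True
    elif all(covered[j] for j in adj[i]):           # EXCHANGE: remove a mark
        covered[i] = False

def anneal_cover(adj, mu_values, steps_per_mu, p_move=0.5, seed=0):
    """Slowly increase mu, starting from the trivial cover of all vertices."""
    rng = random.Random(seed)
    covered = [True] * len(adj)
    for mu in mu_values:
        for _ in range(steps_per_mu):
            mc_step(adj, covered, mu, p_move, rng)
    return covered

# Hypothetical usage on a ring of six vertices.
ring = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
print(sum(anneal_cover(ring, [0.5 * t for t in range(1, 21)], 200)))
```

Both step types preserve the cover property: a mark is only moved onto the single uncovered neighbor, and only removed when all neighbors are covered.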

The efficiency of randomized incomplete algorithms can be increased by 
introducing restarts. The basic idea is to let the randomized algorithm run for a 
fixed number of steps. If no solution is found in this time, the algorithm is restarted 
from the beginning but with a different seed of the random number generator. The 
rationale behind this concept is that during a run the system may be trapped in 
a local minimum, hence the chance of finding a solution is increased when starting 
again. 
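
A generic restart wrapper can be sketched as below. The callable `run_once` is a hypothetical stand-in for any randomized algorithm that returns a solution or `None` within a step budget; its signature and the seeding scheme are assumptions made for this illustration.

```python
import random

def with_restarts(run_once, max_steps, max_restarts, seed=0):
    """Call run_once(rng, max_steps) with fresh seeds until it succeeds.

    Returns (solution, number_of_runs); solution is None if all runs failed.
    """
    for r in range(max_restarts):
        rng = random.Random(seed + r)       # a different seed for every restart
        solution = run_once(rng, max_steps)
        if solution is not None:
            return solution, r + 1
    return None, max_restarts

def toy_search(rng, max_steps):
    """Toy stand-in: each step succeeds with a small probability."""
    for _ in range(max_steps):
        if rng.random() < 0.05:
            return "found"
    return None

print(with_restarts(toy_search, max_steps=10, max_restarts=1000))
```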

3.2. Complete Algorithms 

Next, we present two complete algorithms: They guarantee to find the exact answer, 
even if the time required will, in general, grow exponentially with the graph size. 

First we turn to the problem where we are interested only in minimum-size vertex 
covers (problem P1). Since each vertex can be either covered or uncovered, the most 
direct approach is to enumerate all possible $2^N$ configurations, store all those being 
VCs, and finally select one of those having minimal cardinality. Obviously, the 
time complexity of this approach is $O(2^N)$. Early attempts have the same 
worst-case running time. The approach of Tarjan and Trojanowski [34] presented 
here has an $O(2^{N/3})$ time complexity. It uses a divide-and-conquer approach. First, 
all connected components of the graph are obtained. Then the minimum-size vertex 
covers for all components are calculated separately by recursive calls. The treatment 
of each connected component is based on the following idea. Let $i \in V$ be a vertex, 
$A(i) \subset V$ its neighbors in $G$, and for any subset $S \subset V$ let $G(S) = (S, E(S))$ be the 
subgraph induced by $S$, i.e. $E(S) = E \cap (S \times S)$. Then the minimum-size vertex 
cover is either $\{i\}$ combined with the minimum-size vertex cover of $G(V \setminus \{i\})$ or $A(i)$ 
combined with the minimum-size vertex cover of $G(V \setminus \{i\} \setminus A(i))$. 
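
The two-way branching on a vertex $i$ can be written as a short recursion. The sketch below implements only this basic idea; it does not split into connected components and uses none of the case distinctions of Tarjan and Trojanowski, so it does not reach their running-time bound. The function names and the choice of a maximum-degree branching vertex are assumptions.

```python
def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbors."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj

def min_cover_branch(adj):
    """Minimum vertex cover via: either i is covered, or all of A(i) is."""
    i = max(adj, key=lambda v: len(adj[v]), default=None)
    if i is None or not adj[i]:
        return set()                                    # no edges left

    def induced(removed):
        # subgraph induced by the vertices not in `removed`
        return {v: adj[v] - removed for v in adj if v not in removed}

    with_i = {i} | min_cover_branch(induced({i}))
    neighbors = set(adj[i])
    without_i = neighbors | min_cover_branch(induced({i} | neighbors))
    return with_i if len(with_i) <= len(without_i) else without_i

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
print(min_cover_branch(adjacency(edges)))
```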

Furthermore, the algorithm uses the concept of domination. This means basically 
that one considers small subgraphs S. Among all possible VCs of the subgraph one 
disregards all those, which provably cannot lead to better VCs of the full graph - 
mainly because they cover only few or none of the edges connecting vertices from S 
to V \ S. We explain the simplest example for domination. In this case leaves are 
dominated, i.e. vertices i having only one single neighbor j. Here, for a minimum-size 
vertex cover one must cover either i or j. Since i has only one neighbor, but j may 
have more, we can immediately cover j and remove the vertices i,j and all incident 
edges. This is the basic idea of the leaf-removal algorithm of Bauer and Golinelli. 
Note that this corresponds to the case depth $k = 1$, $w_d = \delta_{d,1}$ of the generalized 
heuristic discussed in the last section. 
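
Leaf removal itself is easy to sketch. The following is a minimal illustration of the idea described above, not the authors' implementation; the dictionary-of-sets representation and the returned "core" (the part of the graph that has no leaves left) are choices made here.

```python
def leaf_removal(adj):
    """Cover the neighbor j of a leaf i, delete i and j, and iterate.

    adj maps each vertex to the set of its neighbors.
    Returns (cover, core): the covered vertices and the remaining graph.
    """
    adj = {v: set(nb) for v, nb in adj.items()}     # work on a copy
    cover = set()
    leaves = [v for v in adj if len(adj[v]) == 1]
    while leaves:
        i = leaves.pop()
        if i not in adj or len(adj[i]) != 1:
            continue                                # stale entry, no longer a leaf
        (j,) = adj[i]
        cover.add(j)
        for v in (i, j):                            # remove i and j with their edges
            for u in adj.pop(v):
                if u in adj:
                    adj[u].discard(v)
                    if len(adj[u]) == 1:
                        leaves.append(u)            # u has become a leaf
    core = {v: nb for v, nb in adj.items() if nb}
    return cover, core

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(leaf_removal(path))
```

If the core is empty, the returned set is a minimum vertex cover; otherwise the remaining core has to be treated by other means.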
The full algorithm is still deterministic but more general than leaf removal: For 
each connected component, the vertex $i_0$ having the smallest degree is determined. 
Degree $d_{i_0} = 0$ corresponds to an isolated vertex, which is not covered. Degree $d_{i_0} = 1$ 
corresponds to a leaf, which is treated as discussed above. Furthermore, the algorithm 
treats explicitly the cases of degree $d_{i_0} = 2, 3$ and 4. For higher lowest degrees $d_{i_0} > 4$, 
basically the subproblems for $i_0$ covered and $i_0$ uncovered must be treated completely. 
But during the recursive calls generated in this way, the cases with smaller minimum 
degree might appear again. The full detailed five-page presentation of the algorithm 
with all cases and subcases can be found in Ref. [34]. Due to the application of 
domination the algorithm runs faster, but it is unable to find more than one minimal 
VC, hence it cannot be used to enumerate all solutions. 

A simpler-to-implement algorithm [36] exhibits a worse time complexity, 
$O(2^{N/2.863})$, but the authors claim that within their computer experiments it was 
faster than the method of Tarjan and Trojanowski. 

If one is not only interested in one single minimum VC but in enumerating all, the 
divide-and-conquer method does not work and branch-and-bound approaches [37, 38] 
must be applied. Also for the case where the number of covering marks $X$ is given and 
one looks for all configurations of minimum energy (problem P3), a branch-and-bound 
method is feasible. We will present an algorithm for this latter case. The algorithm 
enumerating all minimum-size VCs (problem P1) works in the same spirit. 

The branch-and-bound approach differs from the previous method by the fact 
that the concept of domination cannot be used. The basic idea is to build the full 
configuration tree. While doing this, the algorithm makes certain choices where to 
put covering marks. If no VC of the desired size is found, some covering marks have 
to be removed and to be placed elsewhere, i.e. the algorithm has to backtrack. This 
is done in a systematic way allowing to investigate the full configuration space. This 
$O(2^N)$ running time is reduced by omitting subtrees of the full tree by using a bound: 
Subtrees where for sure no minimum-energy configuration is located can be omitted. The 
bound applied in the following algorithm uses the current vertex degree $d(i)$, which is 
the number of uncovered neighbors at a specific stage of the calculation. By covering 
a vertex $i$ the total number of uncovered edges is reduced by exactly $d(i)$. If several 
vertices $j_1, j_2, \ldots, j_k$ are covered, the number of uncovered edges is at most reduced 
by $d(j_1) + d(j_2) + \ldots + d(j_k)$. Assume that at a certain stage within the backtracking 
tree, there are uncov edges uncovered and still $k$ vertices to cover. Then a lower bound 
$M$ for the minimum number of uncovered edges in the subtree is given by 

$M = \max\left[0,\ \mathrm{uncov} - \max_{j_1, \ldots, j_k}\left(d(j_1) + \ldots + d(j_k)\right)\right]$ (6) 

The algorithm can avoid branching into a subtree if $M$ is strictly larger than the 
number opt of uncovered edges in the best solution found so far. For the order in which the 
vertices are selected to be (un-)covered within the algorithm, the following heuristic 
is applied: the order of the vertices is given by their current degree. Thus, the first 
descent into the tree is equivalent to the greedy heuristic presented before. Later, it 
will become clear from the results that this heuristic is indeed not a bad strategy. 

The following representation summarizes the algorithm for enumerating all 
configurations exhibiting a minimum number of uncovered edges. Let $G = (V, E)$ 
be a graph, $k$ the number of vertices to cover and uncov the number of currently uncovered edges. 
Initially $k = X$ and uncov $= |E|$. The variable opt is initialized with opt $= |E|$ and 
contains the minimum number of uncovered edges found so far. The value of opt 
is passed via call by reference. At the beginning all vertices $i \in V$ are marked as 
free. The marks are considered to be passed via call by reference as well (not shown 
explicitly). Additionally it is assumed that somewhere a set of (optimum) solutions 
can be stored. 

algorithm min-cover(G, k, uncov, opt) 
begin 
    if k = 0 then {leaf of tree reached?} 
    begin 
        if uncov < opt then {new minimum found?} 
        begin 
            opt := uncov; 
            clear set of stored configurations; 
        end; 
        if uncov = opt then 
            store configuration; 
        return; 
    end; 
    if bound condition is true (see text) then 
        return; 
    let i ∈ V be a vertex marked as free of maximal current degree; 
    mark i as covered; 
    k := k - 1; 
    adjust degrees of all neighbors j of i: d(j) := d(j) - 1; 
    min-cover(G, k, uncov - d(i), opt) {branch into 'left' subtree}; 
    mark i as uncovered; 
    k := k + 1; 
    (re)adjust degrees of all neighbors j of i: d(j) := d(j) + 1; 
    min-cover(G, k, uncov, opt) {branch into 'right' subtree}; 
    mark i as free; 
end 

In the actual implementation, the algorithm does not descend further into the 
tree as well, when no uncovered edges are left. In this case the vertex covers of the 
corresponding subtree consist of the vertices covered so far and all possible selections 
of k vertices among all uncovered vertices. 
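
For readers who prefer executable code, the branch-and-bound scheme can be sketched in Python as follows. This is an illustrative transcription under stated assumptions, not the implementation used for the results reviewed here: vertices are labeled `0..n-1`, the function name and the state labels are invented for this sketch, a branch is additionally cut when fewer free vertices than marks remain, and the shortcut for the case of no remaining uncovered edges is omitted.

```python
def min_energy_configs(n, edges, X):
    """Enumerate all sets of X covered vertices with minimal number of uncovered edges.

    Returns (opt, configs) where configs is a list of frozensets.
    """
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    state = ['free'] * n                      # 'free', 'cov' or 'unc'
    deg = [len(adj[v]) for v in range(n)]     # current degree: neighbors not covered
    opt, configs = len(edges), []

    def recurse(k, uncov):
        nonlocal opt, configs
        if k == 0:                            # leaf of the tree reached
            if uncov < opt:                   # new minimum found
                opt, configs = uncov, []
            if uncov == opt:
                configs.append(frozenset(v for v in range(n) if state[v] == 'cov'))
            return
        free = [v for v in range(n) if state[v] == 'free']
        if len(free) < k:                     # not enough vertices left for k marks
            return
        best_k = sorted((deg[v] for v in free), reverse=True)[:k]
        M = max(0, uncov - sum(best_k))       # lower bound, Eq. (6)
        if M > opt:
            return
        i = max(free, key=lambda v: deg[v])   # free vertex of maximal current degree
        state[i] = 'cov'                      # 'left' subtree: cover i
        for j in adj[i]:
            deg[j] -= 1
        recurse(k - 1, uncov - deg[i])
        for j in adj[i]:
            deg[j] += 1
        state[i] = 'unc'                      # 'right' subtree: leave i uncovered
        recurse(k, uncov)
        state[i] = 'free'

    recurse(X, len(edges))
    return opt, configs

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
print(min_energy_configs(5, edges, 2))
```

Because the bound is only applied when $M$ is strictly larger than opt, all configurations of minimal energy are kept, not just the first one found.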

Finally we note that using the concept of restarts one can also turn a complete 
backtracking algorithm into a (possibly) faster incomplete one. An application to VC 
has been studied by Montanari and Zecchina [39]. The algorithm must be randomized 
for applying restarts. Hence the choice which vertex is treated next is performed in 
some random way, similar to the generalized heuristic presented above. By applying 
many restarts, rare events become important: On one hand, the latter may have 
exponentially smaller search trees, i.e. in this case the algorithm by chance does not 
need to backtrack as long as usually. On the other hand, events of this type are 
exponentially rare. Balancing the exponential gain due to the smaller search tree 
against the exponential loss due to the large number of restarts required to find such an 
event, an optimal backtracking (i.e. running) time per restart can be found. The 
analysis of a restart algorithm for VC [39] is reviewed in Sec. 6.2. 
First, the VC variant is considered where the energy is to be minimized for fixed values 
$x = X/N$ (problem P3). We know that for small values of $x$, the energy density $e$ 
is not zero [$e(x = 0) = E/N = c/2$], i.e. no vertex covers with $xN$ vertices covered 
exist. On the other hand, for large values of $x$, the random graphs are almost surely 
coverable, i.e. $e(x) = 0$. In Fig. 2 the average ground-state energy density and the 
probability $P_{cov}(x)$ that a graph is coverable with $xN$ vertices are shown for different 
system sizes $N = 25, 50, 100$. We consider here the average connectivity $c = 2.0$, but 
qualitatively equivalent results are found for other values of $c$ too. The results [40, 41] 
were obtained using the branch-and-bound algorithm presented in the last section. 
The data are averages over $10^3$ ($N = 100$) to $10^4$ ($N = 25, 50$) samples. As expected, 
the value of $P_{cov}(x)$ increases with the fraction of covered vertices. With growing 
graph sizes, the curves become steeper. This indicates that in the limit $N \to \infty$, 
which we are interested in, a sharp threshold $x_c \approx 0.39$ appears. Above $x_c$ a graph is 
coverable with probability tending to one in the large-$N$ limit, below $x_c$ it is almost 
surely uncoverable. Thus, in the language of a physicist, a phase transition from 
a coverable phase to an uncoverable phase occurs. It is frequently denoted as the 
cov-uncov transition. Note that the value $x_c$ of the critical threshold depends on 
the average connectivity $c$. The result for the phase boundary $x_c$ as a function of $c$ 
obtained from simulations is shown later on. 




Figure 2. Probability $P_{cov}(x)$ that a cover exists for a random graph ($c = 2$) 
as a function of the fraction $x$ of covered vertices. The result is shown for three 
different system sizes $N = 25, 50, 100$ (averaged for $10^3$ - $10^4$ samples). Lines are 
guides to the eyes only. In the left part, where $P_{cov}$ is close to zero, the energy 
average $e$ (see text) is displayed. The inset enlarges the result for the energy in 
the region $0.3 < x < 0.5$. 

In Fig. 3 the median running time of the branch-and-bound algorithm is shown
as a function of the fraction x of covered vertices. The running time is measured in
terms of the number of nodes which are visited in the backtracking tree. Again graphs
with c = 2.0 were considered and an average over the same realizations as before has
been performed. A sharp peak can be observed near the transition x_c. The hardest
instances are typically found in the vicinity of the phase transition. Note that also
for values x < x_c the running time increases exponentially, as can be seen from the
inset of Fig. 3. For values x considerably larger than the critical value x_c, the running
time is linear. The reason is that the heuristic is already able to find a VC, i.e. the
algorithm terminates after the first descent into the backtracking tree§.

Figure 3. Time complexity of the vertex cover. Median number of nodes visited
in the backtracking tree as a function of the fraction x of covered vertices for
graph sizes N = 20, 25, 30, 35, 40 (c = 2.0). The inset shows the region below
the threshold with logarithmic scale, including also data for N = 45, 50. The fact
that in this representation the lines are equidistant shows that the time complexity
grows exponentially with N.

Note that continuous phase transitions in physical systems are usually indicated 
by a divergence of measurable quantities such as the specific heat, magnetic 
susceptibilities or relaxation times. The peak appearing in the time complexity may 
be considered as a similar indicator, but is not really equivalent, because the resolution 
time diverges everywhere, only the rate of divergence is much stronger near the phase 
transition. 

For small values of x in the uncoverable region, the running time is also shorter
than near the phase transition, but still exponential. This is due to the fact that a
configuration with a minimum number of uncovered edges has to be obtained. If only
the question whether a VC exists or not is to be answered, the algorithm can be easily
improved‖, such that for small values of x a polynomial running time is again
obtained.

§ The algorithm used here terminates after a full cover of the graph has been found, since it is not
necessary to enumerate all solutions.
‖ Set best := initially.

5. The phase diagram

The phase diagram gives the value of the critical threshold x_c(c) as a function of the
connectivity c. For low connectivities c < 1 almost all vertices are contained in finite
trees T_k of size k [22, 26]. Then one can calculate x_c(c) using a cluster expansion,
i.e. by explicitly calculating x_c(T_k) for small k and weighting the results with the
contribution of each tree T_k to the ensemble of random graphs. In Ref. 0J| this
expansion has been performed up to tree size k = 7, resulting in very good agreement
with the numerical data for small connectivities c < 0.3.

Using a statistical-mechanics approach it is even possible to derive an exact
solution, which is furthermore valid even beyond the percolation threshold c = 1.
We will show that this solution is valid up to c = e, where e ≈ 2.718 is Euler's number.
The statistical-mechanics treatment is presented in the next subsection. In the second
subsection, we will present the results, compare them to numerical findings and explain
the structure of the phase diagram as well as the solution-space structure, finding four
different percolation transitions occurring in VC on random graphs.



5.1. Mapping VC to a hard-core lattice gas 

To study VC using concepts and methods of statistical mechanics, one has to map the
problem onto a physical system. One possibility is to identify each vertex with an Ising
spin, the two states covered/uncovered corresponding to the two spin orientations ±1
PU]. Then the system can be studied in the canonical ensemble, and the natural choice
for the Hamiltonian is to identify the energy with the number of uncovered edges.

Here we present a different mapping, using the equivalence between VC and a
hard-core lattice gas [42]. Any subset U ⊂ V of the vertex set can be encoded
bijectively as a configuration of N binary occupation numbers:

    x_i = \begin{cases} 0 & \text{if } i \in U \\ 1 & \text{if } i \notin U \end{cases}        (7)

The strange choice of setting x_i to zero for vertices in U becomes clear if we look at
the vertex-cover constraint: An edge is covered by the elements in U iff at most one
of the two end-points has x = 1. So the variables x_i can be interpreted as occupation
numbers of vertices by the center of a particle. The covering constraint translates into
a hard-sphere constraint for particles of chemical radius one: If a vertex is occupied,
i.e. x_i = 1, then all neighboring vertices have to be empty. We thus introduce a
characteristic function

    \chi(x_1, \dots, x_N) = \prod_{\{i,j\} \in E} (1 - x_i x_j)        (8)

which equals one whenever x = (x_1, ..., x_N) corresponds to a vertex cover, and zero
otherwise. Having in mind this interpretation, we write down the grand partition function

    \Xi = \sum_{\{x_i = 0,1\}} \exp\Big( \mu \sum_i x_i \Big) \chi(x)        (9)

with μ being a chemical potential which can be used to control the particle number,
or the cardinality of U.

For regular lattices, this model is well studied as a lattice model for the fluid-solid
transition; for an overview and the famous corner-transfer-matrix solution of the
two-dimensional hard-hexagon model by Baxter, see Ref. [43]. Recently, lattice-gas models with
various kinds of disorder have been considered in connection to glasses 03 1^ OH]
and granular matter gTI EHl Hi EOl ED E21.

Denoting the grand canonical average as

    \langle f(x) \rangle_\mu = \Xi^{-1} \sum_{\{x_i = 0,1\}} \exp\Big( \mu \sum_i x_i \Big) \chi(x) f(x)        (10)

we can calculate the average occupation density

    \nu(\mu) = \frac{1}{N} \Big\langle \sum_i x_i \Big\rangle_\mu = \frac{1}{N} \frac{\partial \ln \Xi}{\partial \mu} .        (11)

Minimal vertex covers correspond to densest particle packings. Considering the
weights in Eq. (10), it becomes obvious that the density ν(μ) is an increasing function of the
chemical potential μ. Densest packings, or minimal vertex covers, are thus obtained
in the limit μ → ∞:

    x_c(c) = 1 - \lim_{\mu \to \infty} \nu(\mu) .        (12)
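
For a very small graph, the lattice-gas mapping can be checked by brute force. The following sketch enumerates all 2^N occupation patterns, keeps those allowed by the hard-core constraint, and evaluates the grand partition function and the occupation density; the function name `grand_partition` is hypothetical, and the exhaustive enumeration is only meant as an illustration, not as the method used in the analysis.

```python
import itertools
import math

def grand_partition(n, edges, mu):
    """Return (Xi, nu): grand partition function and mean occupation density."""
    xi = 0.0
    weighted = 0.0
    for x in itertools.product((0, 1), repeat=n):
        # chi(x) = prod over edges of (1 - x_i x_j): zero if both ends occupied
        if any(x[i] and x[j] for i, j in edges):
            continue
        k = sum(x)
        w = math.exp(mu * k)
        xi += w
        weighted += k * w
    return xi, weighted / (n * xi)

if __name__ == "__main__":
    path = [(0, 1), (1, 2)]          # three vertices on a line
    for mu in (0.0, 2.0, 5.0, 20.0):
        xi, nu = grand_partition(3, path, mu)
        print(mu, xi, 1.0 - nu)      # 1 - nu approaches 1/3 for large mu
```

For the three-vertex path the densest packing occupies both end vertices, so 1 − ν tends to 1/3, the relative size of the minimal vertex cover consisting of the central vertex.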

The main step within the statistical-mechanics approach is to calculate the grand
partition function, Eq. (9). Here we state only the main steps of the calculation without
showing intermediate results; details can be found in Ref. [42]. The results of
Fig. 2 indicate that the model becomes self-averaging in the thermodynamic limit,
i.e. densities of thermodynamic potentials are expected to become independent of the
specific choice of the quenched disorder (the edge set E). Technically we thus have
to calculate the disorder average of the thermodynamic potential, or the logarithm of
the partition function. The latter can be calculated using the replica trick [IS, .

    \overline{\ln \Xi} = \lim_{n \to 0} \frac{\overline{\Xi^n} - 1}{n}        (13)

where the over-bar denotes the disorder average over the random-graph ensemble with
fixed average connectivity c. Taking n to be a positive integer at the beginning, the
original system is replaced by n identical copies (including identical edge sets). In this
case, the disorder average is easily obtained, and the n → 0 limit has to be achieved
later by some kind of analytical continuation in n. The properties of the model can
be derived from the 2^n order parameters [53]

    c(\xi) = \frac{1}{N} \sum_i \prod_a \delta_{\xi^a, x_i^a}        (14)

which give the fraction of vertices having the replicated occupation number x_i = ξ ∈
{0, 1}^n. Using this order parameter, we rewrite the partition function as a functional
integral over all possible normalized distributions c(ξ), Σ_ξ c(ξ) = 1. This integral
can be evaluated using the saddle-point method, i.e. one has to optimize over all
possible normalized functions c(ξ). This cannot be performed in full generality, hence
one has to make an ansatz for c(ξ).
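
As a reminder of why the replica identity of Eq. (13) holds, the standard one-line argument (an expansion of the n-th power for small n, assuming the disorder average and the limit can be interchanged) reads:

```latex
\overline{\Xi^n} = \overline{e^{\,n \ln \Xi}}
                 = 1 + n\,\overline{\ln \Xi} + O(n^2)
\quad\Longrightarrow\quad
\overline{\ln \Xi} = \lim_{n \to 0} \frac{\overline{\Xi^n} - 1}{n}
```

The non-rigorous part of the method is not this identity but the continuation from positive integer n, where the average over the n coupled copies can be carried out, to n → 0.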

The simplest possibility is the so-called replica-symmetric (RS) ansatz, which in
our case assumes that the order parameter c(ξ) depends on ξ only via Σ_a ξ^a:
different replicas cannot be distinguished, and the full permutation symmetry of the
n replicas is unbroken also on the order-parameter level. This leads to a specific
representation of c(ξ) for which the replica limit n → 0 can be taken. The resulting
saddle-point equation can now be solved analytically in the limit of the chemical
potential μ → ∞. The results are presented and discussed in the next subsection.

Before doing this, let us discuss the validity of the replica-symmetric ansatz. As
it turns out [42] by considering the local stability of the corresponding saddle-point
solution, this ansatz is valid for average graph connectivities c < e. At c = e
full replica symmetry breaking (RSB) occurs: Whereas the solution space has a simple
geometrical structure below c = e, where all solutions are collected in a single cluster
in configuration space, a hierarchical splitting into many solution clusters appears
continuously at this breaking point.

Despite many efforts, the technical problem of handling RSB in finite-connectivity
systems is still open. Most attempts |^ ISSJ 157] try to apply the first step of
Parisi's RSB scheme (1RSB) JHl which, however, is technically well understood only
in the case of infinite-connectivity spin glasses. Due to a more complex structure
of the order parameter in finite-connectivity systems, a complete analytical solution
is still missing. Recently, based on the connection to combinatorial optimization,
the interest in this question was renewed [53], and some promising approximation
schemes [SHI have been developed. Even more recently, a breakthrough was
obtained in the context of the cavity method. Being more involved than the replica
method in infinite-connectivity systems, the cavity approach becomes very elegant for
finite connectivities. It allows for a straightforward derivation of self-consistent
order-parameter equations at a level which is equivalent to 1RSB, and these equations can
be efficiently solved numerically using a population-dynamical algorithm. The cavity
method has recently been applied to VC by Zhou [59]. He found that, although 1RSB
reproduces the numerical results above c = e much better than the replica-symmetric
solution and numerically satisfies the bounds presented in Sec. 2.3 (see below), the
1RSB solution is still not correct above c = e. Full RSB has to be included, which is
a completely open technical issue. For this reason, we refer the reader to Refs. [42, 59]
for the technical details and proceed with the presentation of the results, mainly for
RS.



5.2. Phase boundary and percolation transitions 

In this section, we describe the analytical results of the statistical-mechanics treatment,
compare them to numerical simulations and discuss the morphology of the phase diagram,
which can be characterized by the occurrence of four percolation transitions.

For the density in the limit of infinite chemical potential one obtains for the RS
case

    \lim_{\mu \to \infty} \nu(\mu) = \frac{2W(c) + W(c)^2}{2c} ,        (15)

where W(c) is the Lambert W function defined by W(c) exp(W(c)) = c. This
translates to a minimal vertex-cover size given by

    x_c(c) = 1 - \frac{2W(c) + W(c)^2}{2c} .        (16)
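
The replica-symmetric threshold x_c(c) = 1 − (2W(c) + W(c)^2)/(2c) is easy to evaluate numerically. The sketch below computes the Lambert W function by a plain Newton iteration so that no external library is needed; the helper names `lambert_w` and `x_c_rs` are hypothetical, and the formula is only expected to be exact for 0 < c < e.

```python
import math

def lambert_w(c, tol=1e-14):
    """Principal branch of W for c >= 0, by Newton iteration on w*exp(w) = c."""
    w = math.log1p(c)  # starting guess to the right of the root for c >= 0
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - c) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

def x_c_rs(c):
    """Replica-symmetric minimal vertex-cover fraction for connectivity c > 0."""
    w = lambert_w(c)
    return 1.0 - (2.0 * w + w * w) / (2.0 * c)

if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0, math.e):
        print(c, x_c_rs(c))
```

For c = 2.0 this gives x_c ≈ 0.392, consistent with the threshold x_c ≈ 0.39 read off from the simulations, and for small c it approaches c/2, the value expected when the graph consists of isolated edges.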

To calculate the phase boundary numerically, it is sufficient to construct a single
minimal vertex cover. Hence one can apply the divide-and-conquer algorithm or the
version of the branch-and-bound algorithm where X is not fixed. To compare with the
analytical results one has to perform the thermodynamic limit N → ∞ numerically.





Figure 4. Phase diagram: Fraction x_c(c) of vertices in a minimal vertex cover
as a function of the average connectivity c. For x > x_c(c), almost all graphs
have vertex covers with xN vertices, while they have almost surely no cover
for x < x_c(c). The solid line shows the replica-symmetric result. The circles
represent the results of numerical simulations. Error bars are much smaller than
symbol sizes. The upper bound of Harant is given by the dashed line, the bounds
of Gazmuri by the dash-dotted lines. The vertical line is at c = e. Inset: All
numerical values were calculated from finite-size scaling fits of x_c(N, c) using
functions x_c(N) = x_c + a N^{-b}. We show the data for c = 2.0 as an example.



This can be achieved by calculating an average value x_c(N) for different graph sizes
N, as shown for c = 2.0 in the inset of Fig. 4. Using the heuristic fit function

    x_c(N) = x_c + a N^{-b}        (17)

the value of x_c(∞) = x_c can be estimated from numerical data for finite graphs. As
can be seen from the inset, the fit matches well.
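
A fit of the form x_c(N) = x_c + a N^{-b} can be sketched without any fitting library: for each trial exponent b the model is linear in x_c and a, so ordinary least squares applies, and the best b is taken from a grid. The function name `fit_power_law`, the grid of exponents and the data in the usage example are all made up for illustration; they are not the values underlying Fig. 4.

```python
def fit_power_law(sizes, values, b_grid=None):
    """Fit values ~ x_inf + a * N**(-b); returns (x_inf, a, b)."""
    if b_grid is None:
        b_grid = [0.05 * k for k in range(1, 61)]  # trial exponents in (0, 3]
    best = None
    m = len(sizes)
    for b in b_grid:
        u = [n ** (-b) for n in sizes]
        su, sv = sum(u), sum(values)
        suu = sum(t * t for t in u)
        suv = sum(t * v for t, v in zip(u, values))
        det = m * suu - su * su
        if det == 0:
            continue
        # Linear least squares for the model v = x_inf + a * u
        a = (m * suv - su * sv) / det
        x_inf = (sv - a * su) / m
        rss = sum((x_inf + a * t - v) ** 2 for t, v in zip(u, values))
        if best is None or rss < best[0]:
            best = (rss, x_inf, a, b)
    return best[1], best[2], best[3]

if __name__ == "__main__":
    sizes = [25, 50, 100, 200, 400]
    synthetic = [0.39 + 0.5 / n for n in sizes]   # synthetic data, b = 1
    print(fit_power_law(sizes, synthetic))
```

With real data the residual would of course not vanish, and the statistical errors of the individual averages should be taken into account when judging the extrapolated value.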

In Fig. 4 this result is compared to numerical simulations 0D]. Extremely good
agreement is found for small connectivities c < e; up to this value we expect
the replica-symmetric result to be exact. This is astonishing, as the solution does not
show any signature of the graph-percolation transition of the underlying random graph
at c = 1. Please note that due to the application of statistical-mechanics methods like
the replica trick and the replica-symmetric ansatz, the treatment presented above is
not mathematically rigorous. Nevertheless, for c < e, the result (15) was recently proven
to be exact [60] in a constructive way by analyzing a specific VC algorithm. For
c > e systematic deviations between the numerical data and the RS estimate (16) are
visible. For large c, Eq. (16) even violates the bounds given in section 2.3 and the
exactly known asymptotics (jSJ. This is due to the appearance of RSB.

The results [59] of the cavity method (corresponding to 1RSB, not shown) are
better than the RS solution, since they are numerically compatible with the exactly
known asymptotics and lie within the bounds of Sec. 2.3. But still the 1RSB solution
is significantly different from the numerical extrapolations in the region c > e.

An important quantity for the understanding of the phase diagram is the so-called
backbone: Usually the minimal VCs are exponentially numerous. Some vertices are
therefore covered in some solutions, but uncovered in other solutions. But
there are other vertices having the same state in all solutions, being either always
covered or always uncovered. These vertices are frozen in a physical sense and
are called backbone vertices; we may distinguish two different types according to
the two possible covering states. From the replica-symmetric solution, one can read
off immediately [42] the fractions of vertices belonging to these two backbone
types:

    b_{uncov}(c) = \frac{W(c)}{c} ,
    b_{cov}(c) = 1 - \frac{W(c) + W(c)^2}{c} .        (18)

The resulting total fraction of backbone vertices of minimum-size VCs is shown in
Fig. 5. Numerically, the backbone can be calculated by enumerating all minimum-size
vertex covers of each realization for different sizes N and extrapolating to N → ∞
in a similar fashion as in Eq. (17). For c < e again very good agreement is visible. For
c > e, the failure of the RS approach is here even more visible than when studying the
threshold x_c(c). Also two results obtained within the 1-RSB approach (using different
ansatzes) are shown, but they deviate even more strongly from the numerical results.
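
The numerical definition of the backbone can be made concrete with a brute-force sketch for tiny graphs: enumerate all vertex subsets of the smallest size that still covers every edge, then intersect them. The function name `backbone` is hypothetical, and exhaustive enumeration over subsets is an assumption made for brevity; it is not the enumeration method used for the data in Fig. 5.

```python
import itertools

def backbone(n, edges):
    """Return (always_covered, always_uncovered) over all minimum vertex covers."""
    covers = []
    for size in range(n + 1):
        covers = [set(s) for s in itertools.combinations(range(n), size)
                  if all(i in s or j in s for i, j in edges)]
        if covers:
            break  # smallest size admitting a cover: these are the minimum VCs
    always_cov = set.intersection(*covers)
    always_unc = set(range(n)) - set.union(*covers)
    return always_cov, always_unc

if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]
    print(backbone(4, star))        # centre always covered, leaves never
    print(backbone(2, [(0, 1)]))    # a single edge has no backbone
```

The single edge is the simplest graph without backbone: either end-point alone is a minimum cover, so neither vertex is frozen.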




Figure 5. The total backbone size b_uncov(c) + b_cov(c) of minimal vertex covers as
a function of c. The solid line shows the replica-symmetric result, the dotted ones
are the two results of one-step RSB. Numerical data are represented by the error
bars. They were obtained from finite-size scaling fits similar to the calculation for
x_c(c). The vertical line is at c = e where replica symmetry breaks down.

A detailed analysis [42] shows that vertices having a small degree are usually
uncovered backbone vertices, while the high-degree vertices usually form the covered
backbone. This justifies a posteriori the use of the heuristic algorithm presented in Sec.

Further results can be obtained when studying the subgraphs induced by the
backbone and the non-backbone [42]. It turns out that the structure of the
non-backbone graphs in the low-connectivity regime c < e can be described as having
a collection of pairs, which are the simplest graphs having no backbone, as building
blocks. These pairs are connected by additional random edges, see e.g. Fig. 6. The
non-backbone subgraphs show a percolation transition at c_bb = exp(1/√2)/√2 with
1 < c_bb < e. Hence the onset of RSB at c = e cannot be explained by this percolation
transition. A similar study for the backbone subgraphs shows that it percolates already
at the original percolation threshold c = 1.






Figure 6. Examples of smallest non-backbone graphs. Note that all these
graphs can be divided into connected vertex pairs and some supplementary edges
connecting different pairs. A similar structure is found also for the full
non-backbone subgraph at connectivities c < e.

Nevertheless, Bauer and Golinelli have indeed related the onset of RSB to a
fourth percolation transition [35]. They have applied the leaf-removal algorithm to find
minimum-size VCs. The graph remaining after leaf removal is denoted as the core of the
graph. Bauer and Golinelli find that, below c = e, the core splits into small disconnected
components of logarithmic size, while above c = e the core percolates and unifies a finite
fraction of all vertices in its largest connected component. Hence, core percolation seems to
be responsible for the onset of RSB!
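
Leaf removal itself is simple to state: as long as the graph has a vertex of degree one, put its neighbour into the cover and delete that neighbour together with all its edges; whatever has edges left at the end is the core. The sketch below follows this standard description; the function name `leaf_removal` and the data layout are hypothetical, and details such as the order in which leaves are treated are arbitrary choices.

```python
def leaf_removal(n, edges):
    """Iteratively remove leaves together with their neighbours.

    Returns (cover, core): the neighbours placed in the cover and the set of
    vertices that still have edges when no leaf is left (the core).
    """
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    cover = set()
    leaves = [v for v in adj if len(adj[v]) == 1]
    while leaves:
        leaf = leaves.pop()
        if len(adj[leaf]) != 1:
            continue  # degree changed since the vertex was queued
        (nb,) = adj[leaf]
        cover.add(nb)
        # Remove the neighbour and all its edges; the leaf becomes isolated.
        for w in adj[nb]:
            adj[w].discard(nb)
            if len(adj[w]) == 1:
                leaves.append(w)
        adj[nb] = set()
    core = {v for v in adj if adj[v]}
    return cover, core

if __name__ == "__main__":
    print(leaf_removal(4, [(0, 1), (1, 2), (2, 3)]))   # a path: empty core
    print(leaf_removal(3, [(0, 1), (1, 2), (0, 2)]))   # a triangle: all core
```

If the core is empty, the vertices collected in `cover` form a minimum vertex cover; on a non-empty core the algorithm gives no information, which is where the hard part of the problem resides.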



6. Analyzing algorithms 

In theoretical computer science the time complexity of an algorithm is defined as the
asymptotic (N → ∞) worst-case running time measured on a model computer. In
real-world applications one is usually not confronted with this worst case, but with
some kind of typical case. As we have seen in Sec. 4, there might be regions in parameter
space (i.e. graph connectivity and VC size in our case) where VC is typically solved
in polynomial time, while it is typically hard for other parameter regions. Hence,
one would like to observe the easy-hard transition between these regions within an
analytical analysis as well. This would allow for a better understanding of the
underlying mechanisms, and hence be a step towards finding the source of computational
hardness. We will show that also here a statistical-mechanics treatment, in particular
the knowledge of the phase diagram as calculated before, leads to some interesting
insight.

First, we present the average-case analysis of a simple branch-and-bound
algorithm for the decision problem P2. Within the algorithm a simple heuristic
is used to select the next vertex to treat. Next, it is outlined how fluctuations and the
influence of restarts can be incorporated into the analysis. In the third subsection the
analysis of generalized linear-time heuristic algorithms is summarized.

6.1. Analysis of a simple branch-and-bound algorithm

The algorithm under consideration is a simplified version of the algorithm presented in
Sec. 3.2. The reason for this simplification is that it allows for an analytical approach.
As more sophisticated methods are developed during the next years,
based on the basic understanding gained for simple algorithms, it should become
possible to analyze more elaborate algorithms, too.

The simplified branch-and-bound algorithm does not use the greedy heuristic;
instead the vertices are selected randomly among the free vertices. Please note
that this corresponds to the case w_d = 1 in the generalized heuristic of Sec. 3.1.
Furthermore the depth k = 0 is used, i.e. when uncovering a vertex, its neighbors are
not covered immediately. This is also necessary for simplifying the analysis. Finally,
a simpler bound is used: The algorithm continues to branch into subtrees as long as
covering marks are available and as long as no vertex cover has been found.
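
One possible reading of this simplified algorithm is sketched below: vertices are treated in a random order, the "cover" branch is tried first, a vertex that is isolated in the remaining graph needs no mark, and leaving a vertex uncovered is only allowed if none of its neighbours has been left uncovered before. The function name `bb_decide` and the bookkeeping are hypothetical; the sketch is meant to show how the number of visited nodes of the backtracking tree can be counted, not to reproduce the original implementation.

```python
import random

def bb_decide(n, edges, marks, rng):
    """Decide whether a vertex cover with at most `marks` vertices exists.

    Returns (coverable, nodes_visited). Vertices are treated in random order;
    the "cover" branch is always tried first (depth 0, no greedy heuristic).
    """
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    order = list(range(n))
    rng.shuffle(order)
    state = [None] * n  # None = free, True = covered, False = uncovered
    nodes = 0

    def descend(pos, left):
        nonlocal nodes
        if pos == n:
            return True  # every vertex treated, no edge left uncovered
        nodes += 1
        v = order[pos]
        open_nbrs = [w for w in adj[v] if state[w] is not True]
        if not open_nbrs:
            # v is isolated in the remaining graph: no mark needed, no branching
            state[v] = False
            found = descend(pos + 1, left)
        else:
            found = False
            if left > 0:  # left branch: spend a covering mark on v
                state[v] = True
                found = descend(pos + 1, left - 1)
            # right branch: leave v uncovered, allowed only if this leaves
            # no edge with both end-points uncovered
            if not found and all(state[w] is None for w in open_nbrs):
                state[v] = False
                found = descend(pos + 1, left)
        state[v] = None
        return found

    return descend(0, marks), nodes

if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(bb_decide(3, triangle, 1, random.Random(0)))  # not coverable
    print(bb_decide(3, triangle, 2, random.Random(0)))  # coverable, 3 nodes
```

When the first descent succeeds, exactly N nodes are visited, which is the linear-time regime; otherwise the counter grows with the size of the subtree that has to be searched.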

The type of analysis presented here was first applied to the 3-SAT problem
by Cocco and Monasson [HI]. The application to VC is presented in Ref.
The analysis of the algorithm consists of two parts: first, the analysis of the first
descent into the tree and, second, the calculation of the full running time, which includes
backtracking if no cover was found in the first descent. The running time is measured
in terms of the number of nodes visited in the backtracking tree.

The first descent into the tree: Previously, probabilistic analyses of descent
algorithms have been applied to establish rigorous bounds on phase boundaries
|(j3l ini 1101 122]. The analysis of the first descent into the backtracking tree is
straightforward for the algorithm presented here, as it forms a Markov process of random
graphs. In every time step, one vertex and all its incident edges are covered and can
be regarded as removed from the graph. As the order of appearance of the vertices
is not correlated with the geometrical structure of the graph, the graph remains a random graph.
After T steps, we consequently find a graph G_{N−T, c/N} having N − T vertices. As
the edge probability remains unchanged, the average connectivity decreases from c to
(1 − T/N)c.

For large N, it is reasonable to work with the rescaled time t = T/N, which
becomes continuous in the thermodynamic limit. In this notation, our generated
graph reads G_{(1−t)N, c/N}. An isolated vertex is now found with probability
(1 − c/N)^{(1−t)N−1} ≃ exp{−(1 − t)c}, so the expected number of free covering marks
becomes X(t) = X − N ∫_0^t dt' [1 − exp{−(1 − t')c}]. The first descent thus describes
a trajectory in the c-x plane,

    c(t) = (1 - t) c ,
    x(t) = \frac{x - t}{1 - t} + \frac{e^{-(1-t)c} - e^{-c}}{(1 - t) c} .        (19)

The results are presented in Fig. 7. One observes a perfect agreement of the analytical
result and the trajectory generated for a large graph.
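
The trajectory x(t) = (x − t)/(1 − t) + (e^{−(1−t)c} − e^{−c})/((1 − t)c), c(t) = (1 − t)c can be compared with a quick simulation. In the sketch below, `first_descent` evaluates the analytical expression, while `simulated_marks_used` counts the covering marks consumed in one descent. Two simplifying assumptions are made and should be kept in mind: the graph is drawn with a fixed number cN/2 of random edges rather than with independent edge probabilities, and the random vertex order is replaced by the index order, which is equivalent by symmetry. Both function names are hypothetical.

```python
import math
import random

def first_descent(c, x, t):
    """Analytical trajectory (c(t), x(t)) of the first descent for 0 <= t < 1."""
    c_t = (1.0 - t) * c
    x_t = (x - t) / (1.0 - t) + (math.exp(-c_t) - math.exp(-c)) / c_t
    return c_t, x_t

def simulated_marks_used(n, c, t, seed=0):
    """Marks used (per original vertex) after t*n steps of one descent.

    A vertex consumes a mark iff it still has an edge to a vertex treated
    later, i.e. iff it is the smaller-index end-point of some edge.
    """
    rng = random.Random(seed)
    has_later = [False] * n
    for _ in range(round(c * n / 2)):
        i, j = rng.sample(range(n), 2)
        has_later[min(i, j)] = True
    return sum(has_later[:int(t * n)]) / n

if __name__ == "__main__":
    c, x, t = 2.0, 0.8, 0.5
    c_t, x_t = first_descent(c, x, t)
    used = simulated_marks_used(20000, c, t, seed=1)
    # x(t) is the number of remaining marks per remaining vertex
    print(c_t, x_t, (x - used) / (1.0 - t))
```

The two values printed for x(t) should agree up to finite-size fluctuations of order N^{-1/2}.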

Analysis of the full algorithm: To understand how the algorithm works, we study
the trajectories together with the phase diagram. We can observe three regions; the
shape of the search tree is schematically represented in Fig. 8.

I Easy and coverable: The algorithm works in linear, i.e. polynomial, time if the
first descent already finds a VC. This is the case for large starting values of x.
Then x(t) reaches the value one at a certain rescaled time t < 1, and the graph
is proven to be coverable after having visited tN nodes of the backtracking tree.
The critical value x_b(c) above which this happens can be obtained from Eq. (19) by setting
x(t) = 1 and solving for x in the limit t → 1:

    x_b(c) = 1 + \frac{e^{-c} - 1}{c} .        (20)

Figure 7. Trajectories of the first descent in the (c, x) plane. The full lines
represent the analytical curves, the symbols numerical results of one random
graph with 10^6 vertices, c = 2.0 and x = 0.8, 0.7, 0.6, 0.5 and 0.3. The
trajectories follow the sense of the arrows. The dotted line x_b(c) separates the
regions where this simple algorithm finds a cover from the region where the method
fails. No trajectory crosses this line. The long-dashed line represents the true
phase boundary x_c(c); instances below that line are not coverable.

II Hard and coverable: For x_c(c) < x < x_b(c) the graph is typically coverable, but
during the first descent x(t) vanishes already before all edges have been covered.
The trajectory crosses the phase-transition line at a certain rescaled time t̃ at
(c̃, x̃). There the generated random subgraph of Ñ = (1 − t̃)N vertices and
average connectivity c̃ becomes uncoverable by the remaining x̃Ñ covering marks.
To determine that the generated subproblem is not coverable, the algorithm has
basically to visit the full backtracking tree for the subproblem. Hence, exponential
solution times have to be expected. This means that x_b(c) > x_c(c) marks the easy-hard
transition of the algorithm. After backtracking the region of the uncoverable
subproblem, the algorithm proceeds until a solution is found.

III Hard and uncoverable: For x < x_c(c), the graph is typically uncoverable. Thus,
again the algorithm has to build a full backtracking tree until it is proven that
no VC exists. Hence, again the running time is exponential.
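
The three regions can be summarized in a few lines of code, using the easy-hard boundary x_b(c) = 1 + (e^{−c} − 1)/c together with the replica-symmetric phase boundary x_c(c) = 1 − (2W(c) + W(c)^2)/(2c). This is only a sketch of the typical-case classification for c < e, where the RS result is expected to be exact; the function names are hypothetical, and the Lambert W function is again obtained by a simple Newton iteration.

```python
import math

def lambert_w(c):
    """Principal branch of W for c >= 0 (Newton iteration on w*exp(w) = c)."""
    w = math.log1p(c)
    for _ in range(100):
        ew = math.exp(w)
        w -= (w * ew - c) / (ew * (w + 1.0))
    return w

def x_c(c):
    """Replica-symmetric cov-uncov threshold, expected to be exact for c < e."""
    w = lambert_w(c)
    return 1.0 - (2.0 * w + w * w) / (2.0 * c)

def x_b(c):
    """Above this value the first descent of the simple algorithm succeeds."""
    return 1.0 + (math.exp(-c) - 1.0) / c

def region(c, x):
    """Typical-case dynamical region of the simple algorithm."""
    if x > x_b(c):
        return "I: easy and coverable"
    if x > x_c(c):
        return "II: hard and coverable"
    return "III: hard and uncoverable"

if __name__ == "__main__":
    for x in (0.8, 0.7, 0.6, 0.5, 0.3):
        print(x, region(2.0, x))
```

For c = 2.0 one finds x_b ≈ 0.568 and x_c ≈ 0.392, so the starting values x = 0.8, 0.7 and 0.6 lie in region I, x = 0.5 in region II and x = 0.3 in region III.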

For more sophisticated algorithms, also a phase IV can appear which is easy and
uncoverable. This happens if the bound used is able to prove already at the very
beginning that no VC of the allowed size exists, and no exponential backtracking is
required. The simple algorithm considered in \<ci2\ does not show this phase.

Figure 8. Shape of the backtracking tree in the three dynamical regions;
contradictions are denoted by "C", solutions by "S": In I, the heuristic
immediately finds a solution, no backtracking is required. In II, the heuristic fails,
the algorithm has to backtrack. It has to go back to the tree level where the first
uncoverable sub-instance was generated. In III, the graph is uncoverable with
the given number of covering marks. The algorithm has to backtrack completely.



To calculate the running time of the algorithm one has to calculate the size of
the backtracking tree generated during the calculation. This size is determined by
the numbers Ñ, c̃ and x̃ characterizing the uncoverable subproblem which is typically
generated. This calculation can be performed using an annealed approximation and
by applying a saddle-point argument (i.e. the running time is exponentially dominated
by the largest uncoverable subproblem generated). Details can be found in Ref. [^ .
The result is displayed in Fig. 9, where it is compared with numerical simulations.

Note that the algorithm exhibits a peak of the running time exactly at the phase
boundary. This can be directly understood by looking again at Fig. 8. For x > x_c(c)
the uncoverable subproblems, which have to be backtracked fully, are smaller than the
full graph. For x < x_c(c), the number of covering marks is so small that the generated
backtracking trees are smaller due to the trivial bound included in the algorithm.
Thus, directly at the phase boundary the size of the backtracking tree is maximal.

6.2. Fluctuations and random restarts 

In the analysis summarized above, the algorithm was assumed to follow the typical, or 
average, trajectory in phase space, and the generated subproblems become uncoverable 
exactly when the trajectory crosses the cov-uncov phase boundary. These assumptions 
hold with a probability tending to one in the thermodynamic limit, so they are 
perfectly justified if we consider a single run of the algorithm. 

Figure 9. Normalized and averaged logarithm τ = ln t_bt / N of the running time t_bt
of the algorithm as a function of the fraction x of covered vertices. The solid
line is the result of the annealed calculation. The symbols represent the numerical
data for N = 12, 25, 50; lines are guides to the eye only.

There are, however, exponentially rare deviations from these two assumptions,
which can be exploited by running the algorithm described above only up to some
cutoff backtracking time, and restarting it using a new seed for the random-number
generator if no solution was found. In general we will need exponentially many
restarts of the algorithm, but these can be over-compensated by an exponential time
gain due to the restricted backtracking time. According to Montanari and Zecchina
[39], the relevant rare events are:

• Also in the uncoverable phase, there exists an exponentially small fraction of
coverable instances. Following the first descent into the backtracking tree in these
rare cases, the system will stay coverable up to a point well inside the uncoverable
phase. The largest generated uncoverable sub-instance will be smaller, and the
backtracking time consequently shorter. The exponential gain due to the smaller
backtracking tree has to be balanced against the exponential number of restarts
needed to find this smaller tree. Analytically, these events can be described in
a replica calculation generalizing the one which was used to calculate the phase
boundary.

• Right from the beginning, the algorithm may follow a different trajectory in
parameter space, also hitting the phase boundary at a later point. Again,
macroscopic deviations from the typical trajectory are exponentially rare, but
they can be exploited by exponentially many restarts. This can be understood
analytically within the path-integral formalism introduced by Montanari and
Zecchina [39], by calculating the probability of an arbitrary trajectory (c(t), x(t))
starting at (c_0, x_0).

Most astonishingly, Montanari and Zecchina [39] found that the optimal time between
restarts is only linear in N, i.e. that essentially no backtracking is needed, because the
heuristic is able to find a solution even in the first descent, even if this happens with
small probability. These analytical results were beautifully confirmed by numerical
simulations.

In a more general case [31] this can be different: A non-trivial optimum in the
restart time can be observed numerically for more sophisticated algorithms.
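
The limiting strategy in which the cutoff equals a single descent can be sketched as follows: each run treats the vertices in a fresh random order, covers every vertex that still has an uncovered edge, and the run is repeated with a new seed until the resulting cover fits into the allowed number of marks. The function names and the seeding scheme `seed + r` are hypothetical, and the example graph is deliberately trivial.

```python
import random

def first_descent_cover(adj, rng):
    """One heuristic descent: random vertex order, cover every non-isolated vertex."""
    order = list(range(len(adj)))
    rng.shuffle(order)
    cover = set()
    for v in order:
        # v needs a covering mark iff one of its edges is still uncovered
        if any(w not in cover for w in adj[v]):
            cover.add(v)
    return cover

def cover_with_restarts(n, edges, marks, max_restarts, seed=0):
    """Restart the descent with new seeds until a cover with <= marks is found."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    for r in range(max_restarts):
        cover = first_descent_cover(adj, random.Random(seed + r))
        if len(cover) <= marks:
            return cover, r + 1
    return None, max_restarts

if __name__ == "__main__":
    star = [(0, i) for i in range(1, 6)]
    print(cover_with_restarts(6, star, marks=1, max_restarts=200))
```

For the star graph a single descent succeeds only if the centre happens to be treated first, i.e. with probability 1/6, so a handful of restarts is typically sufficient; on large random graphs close to the threshold the success probability per run is exponentially small instead.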

6.3. Generalized heuristics

Within the two analyses presented above only a simple heuristic was considered.
The generalized heuristic presented in Sec. 3.1 was analyzed by one of the authors
[29], again for an ensemble of diluted random graphs characterized by an average
connectivity c. The analysis concentrated on the heuristic itself, not
on the interplay with a backtracking algorithm. The basic idea is similar to the
first-descent analysis presented in the preceding section: one follows the dynamics of the
algorithm analytically in a suitably chosen parameter space. For the algorithm studied
in the preceding analysis, the degree distribution p_d of the graphs is unchanged for
all times, i.e. it remains the usual random-graph distribution (Poissonian). Only the
average connectivity c(t) is time dependent, leading to a simple differential equation.
For the generalized analysis the degree distribution itself is time dependent, i.e. one
obtains an infinite set of differential equations for p_d(t). The other difference is that
in the preceding section the relative number x of covering marks was given as input to
the algorithm (problem P3), while in this case the algorithm runs until all edges are
covered (problem P1). The final result of the analysis gives the relative size x_f(c) of the
resulting VC. This allows one to compare different variants of the heuristic: Algorithms
with smaller x_f(c) perform better.

The central idea in the improved heuristic is to select vertices according to
degree-dependent weights w_d ∝ d^α. This allows, e.g., for the preferential selection of
high-connectivity vertices as used in the complete algorithm described in Sec. 3.2. In
addition, the inclusion of more than one vertex was allowed by going to depth-k
algorithms as already described. The main results of [29] are the following:

• For depth k = 0, the algorithmic performance increased with growing α,
i.e. with a stronger preference for selecting high-connectivity vertices initially.
Asymptotically, the constructed vertex covers were found to be of size Xf{c) ~ 
1 — 2q;/(c -I- 2a). The correct asymptotics of minimal VCs is reached to leading 
order only in the limit α → ∞, which is the case implemented in Sec. 3.2.

• For higher depth k ≥ 1, the correct asymptotics is already reached for α = 0, i.e. 
for a completely random selection of vertices. This also includes the algorithm 
studied by Gazmuri [27], which is characterized by k = 1 and α = 0. Still, for 
low connectivities the constructed VCs are considerably larger than the minimal 
ones. 

• The best performance was found for a generalized leaf-removal with w_d ∼ 
A δ_{d,1} + 1. In the limit A ≫ 1, this algorithm unifies the perfect result of leaf 
removal for small connectivities c < e with the correct asymptotic performance 
of depth-1 algorithms. 
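
The heuristics compared in the list above can be illustrated by a small simulation. The following Python sketch is not the analytical treatment of the cited work. It rests on several assumptions that are not stated in this section: depth 0 is taken to mean that the selected vertex itself is covered, depth 1 that the selected vertex stays uncovered while all its neighbors are covered, and only vertices that still have uncovered edges are candidates for selection. The function names random_graph and heuristic_cover are hypothetical.

```python
import random

def random_graph(n, c, rng):
    """Adjacency sets of a random graph with n vertices and int(c*n/2) edges."""
    adj = {v: set() for v in range(n)}
    n_edges = 0
    while n_edges < int(c * n / 2):
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            n_edges += 1
    return adj

def heuristic_cover(adj, weight, depth, rng):
    """Build a vertex cover by repeated degree-weighted vertex selection.

    depth 0 (assumed): the selected vertex is covered.
    depth 1 (assumed): the selected vertex stays uncovered and all of its
    neighbors are covered.
    weight(d) is the unnormalized selection weight of a vertex of degree d.
    """
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    cover = set()

    def remove(v):
        # Delete v together with all edges incident to it.
        for u in adj[v]:
            adj[u].discard(v)
        adj[v] = set()

    while True:
        # Assumption: only vertices with uncovered edges can be selected.
        active = [v for v in adj if adj[v]]
        if not active:
            return cover
        w = [weight(len(adj[v])) for v in active]
        v = rng.choices(active, weights=w, k=1)[0]
        to_cover = [v] if depth == 0 else list(adj[v])
        for u in to_cover:
            cover.add(u)
            remove(u)

rng = random.Random(1)
g = random_graph(1000, 2.0, rng)
variants = {
    "depth 0, alpha = 0": (lambda d: 1.0, 0),
    "depth 0, alpha = 4": (lambda d: float(d) ** 4, 0),
    "depth 1, alpha = 0": (lambda d: 1.0, 1),
    "depth 1, leaf-biased, A = 1e6": (lambda d: 1e6 * (d == 1) + 1.0, 1),
}
for name, (w, k) in variants.items():
    x_f = len(heuristic_cover(g, w, k, rng)) / len(g)
    print(f"{name}: x_f = {x_f:.3f}")
```

The printed values are single-sample estimates of x_f(c) at finite graph size, so they fluctuate with the seed and should only be read as a qualitative comparison between the variants.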

For technical details we refer to [29]. 

7. VC on other random ensembles 

So far we have presented results for the ensemble of Erdős-Rényi random graphs [52]. 
VC has recently been studied on two other ensembles: on random graphs with a 
power-law degree distribution, including correlations between vertex degrees, and on 
graphs where the basic graph-forming elements are cliques. 

Vázquez and Weigt have introduced a generalized Bethe-Peierls approach, 
which allows one to study VC and other lattice-gas-like models on graphs with arbitrary 
degree distributions p_d. Furthermore, correlations e_{d,d'} between the degrees of 
connected vertices are allowed: the quantity e_{d,d'} measures the probability that, for 
a randomly selected edge, the first end-vertex has degree d and the second one has 
degree d'. The RS result is evaluated for power-law distributions p_d ∼ d^(-γ) (γ > 2) 
with correlations e_{d,d'} = q_d [r δ_{d,d'} + (1 - r) q_{d'}], where q_d = (d + 1) p_{d+1}/c is the 
probability that, for a random edge, a vertex attached to the edge has degree d + 1. 
The parameter r can be used to interpolate between the uncorrelated regime (r = 0) 
and the regime where each vertex is only connected to vertices of the same degree 
(r = 1). The analytical result for the threshold x_c(r) is compared with numerical 
results obtained from the application of a generalized leaf-removal. The leaf-removal 
process can be used to determine the onset of RSB: it appears when, during the run of 
the algorithm, the number of treated vertices having minimal degree larger than 1 
becomes of the order of the graph size. The main result is that for small values of r (e.g. 
r < 0.7 for γ = 2.5) the problem is always easy, i.e. the leaf-removal finds the correct 
answer. In other words: uncorrelated power-law graphs are coverable in polynomial 
time. In this region, good agreement between the analytical and numerical results 
was observed. Results in the RSB region for large r are not available so far. 
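
As a numerical illustration of the correlation ansatz above, the following Python sketch builds q_d and e_{d,d'} for a power-law degree distribution and checks that e_{d,d'} is normalized and has marginal q_d. The cutoff d_max, the choice p_0 = 0 and the function names are assumptions made only for this example. They are not taken from the cited work.

```python
def power_law(gamma, d_max):
    """Truncated power law p_d ~ d^(-gamma) on d = 1..d_max (cutoff assumed)."""
    w = {d: d ** (-gamma) for d in range(1, d_max + 1)}
    z = sum(w.values())
    return {d: x / z for d, x in w.items()}

def edge_end_distribution(p):
    """q_d = (d + 1) p_{d+1} / c, where c is the mean degree."""
    c = sum(d * pd for d, pd in p.items())
    return {d - 1: d * pd / c for d, pd in p.items()}

def edge_correlations(q, r):
    """e_{d,d'} = q_d [ r delta_{d,d'} + (1 - r) q_{d'} ]."""
    return {(d, dp): q[d] * (r * (d == dp) + (1 - r) * q[dp])
            for d in q for dp in q}

p = power_law(gamma=2.5, d_max=200)
q = edge_end_distribution(p)
e = edge_correlations(q, r=0.7)

# Normalization: the sum over all pairs (d, d') should be 1.
print(sum(e.values()))
# Marginal: summing over d' should give back q_d for every d.
print(max(abs(sum(e[(d, dp)] for dp in q) - q[d]) for d in q))
```

For r = 0 the matrix factorizes into q_d q_{d'}, and for r = 1 it is diagonal, which corresponds to the two limiting regimes described in the text.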

A different approach to obtain hard ensembles is followed in Ref. [SO]. There, 
graphs are constructed from basic units consisting of p-cliques, i.e. fully connected 
subgraphs of p vertices. The full graph is obtained by randomly joining K cliques 
in every vertex. VC on such graphs, or the corresponding lattice-gas model, can be 
solved using the cavity approach. For p, K > 3, a discontinuous 1RSB transition 
is found at some VC size which is extensively larger than the minimal VC size. This 
means that the problem is computationally hard, even if one were satisfied with 
a solution of order O(1) away from the optimum. Furthermore, when studying the 
dynamics using a Monte Carlo algorithm in the grand-canonical ensemble (see Sec. 
3.1), a dynamical transition to a glassy phase appears: the system gets trapped 
in metastable states, and equilibration times are exponentially large in N. For this 
reason, VC on the modified graph ensemble represents a good mean-field model for 
structural glass formers. It has, in particular, only two-particle interactions in contrast 
to previous hard-core lattice gas models ISHl EZI EH] for glasses. 

8. Summary and outlook 

We have introduced the vertex-cover problem, which is one of the fundamental 
NP-complete problems in theoretical computer science. We have reviewed different 
incomplete and complete algorithms for solving VC. Although VC is considered to 
be computationally hard, on an ensemble of random graphs it exhibits an easy-hard 
transition when looking for vertex covers of given size. This makes the problem 
very valuable for studies aiming to understand the origin of computational 
hardness. 

Using concepts and methods of statistical physics, many properties of the model 
can be understood which are well beyond the horizon of traditional approaches in 
theoretical computer science. In the low-connectivity region (c < e, i.e. even above 
the percolation threshold c = 1), it is possible to calculate the phase boundary 
exactly using the replica trick or the cavity approach. Above c = e, full RSB sets 
in continuously. The morphology of the phase diagram and the onset of RSB can be 
related to different percolation transitions occurring in the graph and in the solution 
space structure of vertex covers. 

Furthermore, it is possible to analyze simple backtracking algorithms analytically 
by following the parameter flow in the phase diagram and to calculate the easy-hard 
transition threshold. It is possible to understand better how an algorithm solves 
a coverable problem by including fluctuations in the analysis. Also more complex 
heuristics, so far without including backtracking, can be analyzed. 

One central point of future research will be to study special ensembles of 
graphs on which VC is very hard to solve. Examples are graphs with correlations or graphs 
having small complete subgraphs. In particular, one is interested in finite-dimensional 
regular graphs (i.e. lattices) exhibiting one-step RSB, which would make them a good 
model for structural glass formers. 

Another direction of future research will be the analysis of more complicated 
algorithms, i.e. backtracking algorithms with better heuristics or including bounds. 
Finally, the research aims to apply statistical mechanics methods to invent more 
efficient algorithms, similar to the recent development of the survey-propagation 
algorithm by Mézard, Parisi and Zecchina [21], which originates in the cavity approach. 